Abstract



In dynamic graph algorithms the following provide-or-bound problem has to be
solved quickly: Given a set S containing a subset R and a way of generating
random elements from S and testing for membership in R, either (i) provide an
element of R or (ii) give a (small) upper bound on the size of R that holds with
high probability. We give an optimal algorithm for this problem.

This algorithm improves the time per operation for various dynamic graph
algorithms by a factor of $O(\log n)$. For example, it improves the time per
update for fully dynamic connectivity from $O(\log^3 n)$ to $O(\log^2 n)$.



1 Introduction 



In this paper we present a new sampling lemma, and use it to improve the running 
times of various fully dynamic graph algorithms. 

We consider the following provide-or-bound problem: Let S be a set with a
subset $R \subseteq S$. Membership in R can be tested efficiently. For a given
parameter $r > 1$, either

(i) provide an element of R, or

(ii) guarantee with high probability that the ratio $|R|/|S|$ is less than
$1/r$, that is, that $r|R| < |S|$.

This problem arises in the currently fastest fully dynamic graph algorithms
for various problems on graphs, such as connectivity, two-edge connectivity,
k-weight minimum spanning tree, $(1 + \epsilon)$-approximate minimum spanning
tree, and bipartiteness-testing [6]. The connection is made specific in Section 2.

In [6], Henzinger and King solve the problem by sampling $O(r \log |S|)$
elements from S, returning any element found from R. This gives a Monte-Carlo
algorithm whose type (ii) answer is false with probability $1/|S|^{O(1)}$. In
this paper, we give a randomized Monte-Carlo algorithm for which the expected
number of random samples from S is $O(r)$. To be precise, we show the following
lemma.

Sampling Lemma Let R be a subset of a nonempty set S, and let
$r, c \in \mathbb{R}_{>1}$. Set $s = |S|$. Then there is an algorithm with one
of two outcomes:

Case (i) Provide: It returns an element from R.

Case (ii) Bound: It outputs the possibly false statement "$|R|/|S| < 1/r$" with
error probability less than $\exp(-s/(rc))$.

The expected number of samples attributable to a type (i) outcome is $O(r)$, and
the worst-case number of samples attributable to a type (ii) outcome is $O(s/c)$.

The bounds in case (i) and case (ii) are asymptotically optimal. Case (i) is
optimal since it covers the case $|R|/|S| = 1/r$. For case (ii), note that if x
elements from S are sampled randomly and no element of R is found, then the
statement $|R|/|S| < 1/r$ is false with probability approximately $\exp(-x/r)$.
Thus, picking $O(s/c)$ random elements is asymptotically optimal for achieving a
bound of $\exp(-s/(rc))$ on the error probability.
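
The approximation behind the case (ii) argument can be checked numerically. The following is a minimal sketch, assuming independent uniform samples with replacement and the boundary density $|R|/|S| = 1/r$; the helper name `miss_probability` is hypothetical and not part of the paper.

```python
import math

def miss_probability(r: float, x: int) -> float:
    """Probability that x independent uniform samples all miss R
    when |R|/|S| = 1/r (sampling with replacement is assumed)."""
    return (1.0 - 1.0 / r) ** x

r = 20.0
for x in (20, 100, 400):
    exact = miss_probability(r, x)
    approx = math.exp(-x / r)
    # The exact value never exceeds exp(-x/r), because 1 - y <= exp(-y).
    print(f"x={x:4d}  exact={exact:.3e}  exp(-x/r)={approx:.3e}")
```

The two columns agree in the exponent up to a factor close to 1 once r is moderately large, which is the sense in which "approximately" is used above.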

We prove the sampling lemma in Section 3, in which we first prove a simpler
lemma achieving an expected cost of $O(\log\log|S| \cdot r)$ for case (i). This
is already a substantial improvement over the $O(\log|S| \cdot r)$ obtained in
[6]. We then bootstrap the technique, giving the desired cost of $O(r)$.
1.1 Applications

The prime application of our sampling lemma is to speed up fully dynamic graph 
connectivity, which is the problem of maintaining a graph under edge insertions 
and deletions. Queries on the connectivity between specified vertices should be 
answered efficiently. 

In the literature, fully dynamic graph algorithms are compared using the cost
per insert, delete, and query operation. The best deterministic algorithms for
fully dynamic graph connectivity take time $O(\sqrt{n})$ per update operation
and $O(1)$ per query [2, 3, 5]. Recently, Henzinger and King gave a fully
dynamic connectivity algorithm with $O(\log^3 n)$ expected amortized time per
operation using Las-Vegas randomization [6]. This should be compared with a
lower bound of $\Omega(\log n/\log\log n)$ in the cell probe model [4, 8].

In this paper, we prove a sampling lemma and use it to reduce the bound above
to $O(\log^2 n)$.

Henzinger and King show that their approach applies to several other fully
dynamic graph problems, for which we also get improved running times. Thus we
get

• $O(\log^3 n)$ expected time per operation to maintain the bridges in a graph
(the 2-edge connectivity problem);

• $O(k \log^2 n)$ to maintain a minimum spanning tree in a graph with k
different weights;

• $O(\log^2 n \log U/\epsilon)$ to maintain a spanning tree whose weight is a
$(1 + \epsilon)$-approximation of the weight of the minimum spanning tree,
where U is the maximum weight in the graph;

• $O(\log^2 n)$ to test if the graph is bipartite; and

• $O(\log^2 n)$ to test whether two edges are cycle-equivalent.

2 Improved sampling in fully dynamic graph connectivity 

Our results for fully dynamic graph algorithms are achieved by locally improving a 
certain sampling bottleneck in the approach by Henzinger and King [6], henceforth 
referred to as the HK-approach. Rather than repeating their whole construction, we 
will confine ourselves to a self-contained description of this bottleneck, focussing 
on connectivity. Our technique for the bottleneck is of a general flavor and we 
expect it to be applicable in other contexts. 
Consider the problem of maintaining a spanning tree T of some connected
graph $G = (V, E)$, $n = |V|$. If some tree edge e is deleted from T, we get two
sub-trees $T_1$ and $T_2$. Let R be the set of non-tree edges with end-points in
both $T_1$ and $T_2$. Then R is exactly the set of edges f that can replace e in
the sense that $T \cup \{f\} \setminus \{e\}$ is a spanning tree of G. Our
general goal is to find such a replacement edge $f \in R$. Alternatively, it is
acceptable to discover that R is sparse in the following sense: Let S be the set
of non-tree edges incident to $T_1$. Then $R \subseteq S$, and we say that R is
sparse if

$$r|R| < |S|, \quad \text{where } r = \Theta(\log n).$$

Otherwise R is said to be dense.
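
To make the sets S and R concrete, here is a small illustrative sketch that computes them by brute force on a toy graph. It is not the HK data structure (which supports sampling from S in $O(\log n)$ time); the function names, the edge representation, and the example graph are all assumptions made for illustration.

```python
def component(tree_adj, start):
    """Vertices reachable from `start` using tree edges only."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in tree_adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def candidate_and_replacement_edges(vertices, tree_edges, non_tree_edges, deleted):
    """Return (S, R) after deleting the tree edge `deleted` = (u, v).
    T1 is taken to be the sub-tree containing u."""
    adj = {v: set() for v in vertices}
    for a, b in tree_edges:
        if (a, b) != deleted and (b, a) != deleted:
            adj[a].add(b)
            adj[b].add(a)
    t1 = component(adj, deleted[0])
    # S: non-tree edges incident to T1.
    S = [e for e in non_tree_edges if e[0] in t1 or e[1] in t1]
    # R: edges of S with exactly one end-point in T1, i.e. crossing to T2.
    R = [e for e in S if (e[0] in t1) != (e[1] in t1)]
    return S, R

vertices = range(6)
tree_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # a path 0-1-2-3-4-5
non_tree_edges = [(0, 2), (1, 4), (3, 5), (0, 5)]
S, R = candidate_and_replacement_edges(vertices, tree_edges, non_tree_edges, (2, 3))
print("S =", S)
print("R =", R)
```

In this example, deleting (2, 3) gives $T_1 = \{0, 1, 2\}$, so S consists of (0, 2), (1, 4), (0, 5) and R consists of the two crossing edges (1, 4) and (0, 5).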

Given an algorithm that either (a) provides a replacement edge at expected cost
$t(n)$, or (b) discovers that R is sparse at cost $O(t(n) + |S|)$, the amortized
expected operation cost of Henzinger and King's fully dynamic connectivity
algorithm is $O(t(n) + \log^2 n)$.

Using the data structures from the HK-approach, edges from S can be sampled
and tested for membership in R in time $O(\log n)$. Also, in time $O(|S|)$, we
can scan all of S, identifying all the edges in R.

The HK-approach achieves $t(n) = O(\log^3 n)$ as follows. First, $2r \ln n$
random edges from S are sampled. If the sampling successfully finds an edge from
R, this edge is returned, as in (a). Otherwise, hoping for (b), in time
$O(|S|)$, a complete scan of S is performed, identifying all edges of R. If it
turns out, however, that R is dense, an edge from R is returned as in (a). The
probability of this "mistake" is the probability of not finding a replacement
edge in $2r \ln n$ samples despite R being dense, which is at most

$$(1 - 1/r)^{2r \ln n} < 1/n^2 = O(1/|S|).$$

Thus, the expected cost of a mistake is $O((\log^3 n + |S|)/|S|)$. Adding up,
Henzinger and King get $t(n) = O(\log^3 n)$, which is hence the expected
amortized operation cost for their fully dynamic connectivity algorithm.
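
The bound on the mistake probability can be spelled out as follows. This is a worked sketch using the standard inequality $1 - y \le e^{-y}$, and it assumes that G is a simple graph, so that $|S| \le |E| < n^2$:

$$\begin{aligned}
(1 - 1/r)^{2r \ln n} &\le \left(e^{-1/r}\right)^{2r \ln n} = e^{-2 \ln n} = 1/n^2, \\
1/n^2 &< 1/|S| \quad \text{since } |S| < n^2 .
\end{aligned}$$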

We achieve $t(n) = O(\log^2 n)$ by applying our sampling lemma with $c = \ln n$
and $r = O(\log n)$. Then, in case (i) of the lemma, we find an element from
R at expected cost $O(\log^2 n)$. In case (ii), the cost is
$O(\log n \cdot |S|/\log n) = O(|S|)$, matching the cost of a subsequent
scanning. According to the lemma, the probability that R turns out to be dense
is $\exp(-|S|/(rc)) = \exp(-|S|/O(\log^2 n))$, so the expected contribution from
such a mistaken scan is

$$O\bigl(|S| \exp(-|S|/O(\log^2 n))\bigr) = O(\log^2 n).$$

Thus, we get $t(n) = O(\log^2 n)$, which is hence the new expected amortized
operation cost for fully dynamic connectivity.
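
The bound on the expected cost of a mistaken scan uses only elementary calculus. As a sketch, write $x = |S|$ and let $\lambda$ stand for the quantity $rc = O(\log^2 n)$ (the symbol $\lambda$ is introduced here only for illustration):

$$\begin{aligned}
f(x) &= x\,e^{-x/\lambda}, \qquad f'(x) = (1 - x/\lambda)\,e^{-x/\lambda}, \\
\max_{x \ge 0} f(x) &= f(\lambda) = \lambda/e = O(\log^2 n).
\end{aligned}$$

So the expected contribution is $O(\log^2 n)$ regardless of the size of S.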

All our other results for fully dynamic graph algorithms are achieved by the 
same local improvement. 

3 The sampling lemma 

The HK-approach solves the provide-or-bound problem as follows:

Algorithm A:

A.1. Let $S_0$ be a random subset of S of size $r \ln s$.

A.2. $R_0 := S_0 \cap R$.

A.3. If $R_0 \neq \emptyset$, then return $x \in R_0$.

A.4. Print "$|R|/|S| < 1/r$ with probability $> 1 - 1/s$."
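
Algorithm A can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HK implementation: S is given as an explicit sequence, membership in R is a caller-supplied predicate `in_R`, the sample size is rounded up and capped at $|S|$, and a return value of `None` stands for the printed bound of step A.4.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def algorithm_a(S: Sequence[T], in_R: Callable[[T], bool], r: float,
                rng: random.Random) -> Optional[T]:
    """Return an element of R, or None to signal the bound |R|/|S| < 1/r."""
    s = len(S)
    # A.1: sample size r ln s, rounded up and capped at |S| (an assumption).
    k = min(s, math.ceil(r * math.log(s)))
    for x in rng.sample(S, k):   # S_0: random subset, without replacement
        if in_R(x):              # A.2 / A.3: R_0 is nonempty
            return x
    return None                  # A.4: claim |R|/|S| < 1/r

rng = random.Random(0)
S = list(range(1000))
result = algorithm_a(S, lambda x: x % 10 == 0, 5.0, rng)
print("element of R" if result is not None else "bound reported", result)
```

Stopping at the first hit is equivalent to forming $R_0$ and returning any of its elements, since only non-emptiness of $R_0$ matters here.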

Thus, the algorithm provides the first element of R that it finds. Only if it does not 
find one, does it give a bound on the size of R. Recall from the sampling lemma 
that we are willing to pay more for a bound on the size of R than for an element 
of R. Suppose that we have made many samples from S and that we have only 
found one or a few elements from R. Even if our sample size is not big enough 
for the desired high probability bound on R, it may still be fair to hypothesize that 
R is small. Instead of just returning the element from R, based on the hypothesis, 
we should rather continue sampling until we reach a sample size big enough for 
the desired probability bound on R. The probability that the continued sampling 
contradicts our hypothesis that R is small should be low, so that the expected cost 
of such a mistake is low. 

We approximate this approach using a step function: To demonstrate a simplified
version of our technique, we first show a weaker lemma in Section 3.1 using
an algorithm with two rounds of sampling and bounding. In Section 3.2 we use
$\log^* s$ rounds to prove the sampling lemma.

In this section, we make repeated use of the following Chernoff bounds (see [1],
for example): Let $B(n,p)$ be a random variable that has a binomial distribution
with parameters n and p. Then for $\delta \le 1$,

$$\Pr\bigl(B(n,p) \ge (1+\delta)\,E(B(n,p))\bigr) \le e^{-\delta^2 E(B(n,p))/3} \qquad (1)$$

$$\Pr\bigl(B(n,p) \le (1-\delta)\,E(B(n,p))\bigr) \le e^{-\delta^2 E(B(n,p))/2} \qquad (2)$$
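
As a sanity check, the two Chernoff bounds can be compared against exact binomial tails. The sketch below assumes $0 < \delta \le 1$ and uses only the standard library; the function names and the parameter choice $n = 80$, $p = 0.25$, $\delta = 0.5$ are illustrative.

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(n: int, p: float, t: float) -> float:
    """Exact Pr(B(n,p) >= t)."""
    return sum(binom_pmf(n, p, k) for k in range(max(0, math.ceil(t)), n + 1))

def lower_tail(n: int, p: float, t: float) -> float:
    """Exact Pr(B(n,p) <= t)."""
    return sum(binom_pmf(n, p, k) for k in range(0, min(n, math.floor(t)) + 1))

def chernoff_upper(mean: float, delta: float) -> float:
    return math.exp(-delta ** 2 * mean / 3)   # right-hand side of bound (1)

def chernoff_lower(mean: float, delta: float) -> float:
    return math.exp(-delta ** 2 * mean / 2)   # right-hand side of bound (2)

n, p, delta = 80, 0.25, 0.5
mean = n * p
print("upper tail:", upper_tail(n, p, (1 + delta) * mean), "<=", chernoff_upper(mean, delta))
print("lower tail:", lower_tail(n, p, (1 - delta) * mean), "<=", chernoff_lower(mean, delta))
```

The exact tails are typically much smaller than the bounds; the proofs below only need the bounds themselves.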
3.1 Sampling in two rounds

Lemma 1 Let R be a subset of a nonempty set S, and let $r \in \mathbb{R}_{>1}$.
Set $s = |S|$. Then there is an algorithm with one of two outcomes:

Case (i) Provide: It returns an element from R.

Case (ii) Bound: It outputs the possibly false statement "$|R|/|S| < 1/r$" with
error probability less than $1/s$.

The expected number of samples attributable to a type (i) outcome is
$4r(\ln\ln s + 2)$, and the worst-case number of samples attributable to a type
(ii) outcome is $8r \ln s + 4r \ln\ln s$.

Proof: The idea is the following: Instead of just sampling $O(r \log s)$
elements and returning any element from R, we first make an initial round, where
we sample $O(r \log\log s)$ elements. If an element from R is found, we just
return it; otherwise, we believe that R is sparse, in other words that
$|R|/|S| < 1/r$. In fact, with appropriately chosen constants, we conclude with
error probability $O(1/\log s)$ that $|R|/|S| < 1/(4r)$. We now have a
confirming round, where we sample $O(r \log s)$ elements. If the proportion of
elements from R in this sample is $< 1/(2r)$, then using Chernoff bounds, we
conclude that $|R|/|S| < 1/r$ with error probability $< 1/s$. Otherwise, we
have a contradiction to the hypothesis that R is sparse, and we return one of
the elements of R found in the confirming round. However, using Chernoff bounds,
we can show that the probability of entering the confirming round and finding a
ratio $\ge 1/(2r)$ is $O(1/\log s)$, giving an expected cost of $O(r)$ for a
contradiction in the confirming round.

We are now ready to formally present an algorithm with the properties
described in Lemma 1.

Algorithm B:

B.1. Let $S_0$ be a random subset of S of size $4r \ln\ln s$.

B.2. $R_0 := S_0 \cap R$.

B.3. If $R_0 \neq \emptyset$, then return $x \in R_0$.

B.4. Let $S_1$ be a random subset of S of size $8r \ln s$.

B.5. $R_1 := S_1 \cap R$.

B.6. If $|R_1| \ge 4 \ln s$, then return any $x \in R_1$.

B.7. Print "$|R|/|S| < 1/r$ with probability $> 1 - 1/s$."
We show next a bound on the probability p that the algorithm returns an element
from R in B.6 (Claim 1A), that is, that the initial guess of sparsity is not
confirmed. Afterwards we prove that Algorithm B satisfies the conditions of
Lemma 1.

CLAIM 1A The probability p that the algorithm returns an element from R in Step
B.6 is $\le 1/\ln s$.

PROOF: We consider two cases:

Case 1: $|R|/|S| \ge 1/(4r)$. The algorithm did not return in B.3, so

$$p \le (1 - 1/(4r))^{4r \ln\ln s} < e^{-\ln\ln s} = 1/\ln s.$$

Case 2: $|R|/|S| < 1/(4r)$. Then the expected value of $|R_1|$ is at most
$2 \ln s$. But

$$p \le \Pr(|R_1| \ge 4 \ln s) \le \Pr(|R_1| \ge 2E(|R_1|)) \le e^{-E(|R_1|)/3} \le 1/\ln s.$$

The inequality bounding $\Pr(|R_1| \ge 2E(|R_1|))$ follows by Chernoff bound (1).
The last inequality is trivially satisfied for $\ln s \le 1$. Otherwise, since
$x/\ln x \ge e$ for any real $x > 1$, we have
$2(\ln s)/3 \ge 2e(\ln\ln s)/3 > \ln\ln s$.

□

We are now ready to show that Algorithm B satisfies the conditions of Lemma 1.

Case (i) First, we determine the expected number of samples if the algorithm
returns an element from R. By Claim 1A, the probability p that the algorithm
returns an element from R in Step B.6 is bounded by $1/\ln s$. Thus, the
expected number of samples is at most

$$4r \ln\ln s + 8r \ln s/\ln s = 4r(\ln\ln s + 2).$$

Case (ii) Second, we consider the case when the algorithm does not return an
element from R, in other words when the conditions in Steps B.3 and B.6 are not
satisfied. We want to show that the probability that the resulting statement is
false is at most $1/s$.

Suppose $|R|/|S| \ge 1/r$. We did not return an element from R in Step B.6,
so $|R_1| < 4 \ln s$. However, the expected value of $|R_1|$ is at least
$8 \ln s$. Thus, by Chernoff bound (2), the probability that $|R_1|$ is less
than $E(|R_1|)/2$ is bounded by $1/s$:

$$\Pr(|R_1| < E(|R_1|)/2) \le e^{-E(|R_1|)/8} \le e^{-(1/2)^2 \cdot 8 \ln s/2} = 1/s.$$

■

In the next section we show the general sampling lemma. 
3.2 Sampling in many rounds



In this section, we will prove the sampling lemma restated below.

Lemma 2 Let R be a subset of a nonempty set S, and let
$r, c \in \mathbb{R}_{>1}$. Set $s = |S|$. Then there is an algorithm with one
of two outcomes:

Case (i) Provide: It returns an element from R.

Case (ii) Bound: It outputs the possibly false statement "$|R|/|S| < 1/r$" with
error probability less than $\exp(-s/(rc))$.

The expected number of samples attributable to a type (i) outcome is $O(r)$, and
the worst-case number of samples attributable to a type (ii) outcome is $O(s/c)$.

Proof: We will now generalize the construction from the previous section to work
with a sequence of confirming rounds, $i = 1, \ldots, O(\log^* s)$. In round i,
we will pick $r_i n_i$ random elements from S, and if at least $n_i$ elements
from R are found, one of these is returned. For the initial round 0, $n_0 = 1$,
that is, any element from R is returned. In the subsequent confirming rounds,
the numbers $n_i$ of elements of R increase in order to increase our confidence.
At the same time, the thresholds $1/r_i$ are increased, in order to minimize the
probability that the threshold is passed in a later round. The concrete values
of the $n_i$ and $r_i$ are fine tuned relative to the Chernoff bounds that we
use to calculate our probabilities.

Let the increasing sequence $n_0, n_1, \ldots$ be defined such that $n_0 = 1$
and for $i > 0$, $n_i = a 4^i (i + 3)$, where $a = 64 \ln 16 < 178$. Let the
decreasing sequence $r_0, r_1, \ldots$ be defined such that
$r_0 = \ln(2n_1)\,2er \approx 47r$ and for $i > 0$,
$r_i = 2er/\prod_{j=1}^{i}(1 + 1/2^j)$. Since
$e > \prod_{j=1}^{\infty}(1 + 1/2^j)$, $r_i$ is larger than $2r$:

$$r_i = 2er \Big/ \prod_{j=1}^{i}(1 + 1/2^j) > 2r \prod_{j=i+1}^{\infty}(1 + 1/2^j) > 2r.$$
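
The two sequences are easy to tabulate, which gives a quick check of the stated constants. The sketch below is illustrative only; the function names `n_seq` and `r_seq` are hypothetical, and r is passed as a parameter so that the values can be read as multiples of r.

```python
import math

A = 64 * math.log(16)   # the constant a = 64 ln 16

def n_seq(i: int) -> float:
    """n_0 = 1 and n_i = a * 4^i * (i + 3) for i > 0."""
    return 1.0 if i == 0 else A * 4 ** i * (i + 3)

def r_seq(i: int, r: float) -> float:
    """r_0 = ln(2 n_1) * 2er and r_i = 2er / prod_{j=1..i} (1 + 1/2^j) for i > 0."""
    if i == 0:
        return math.log(2 * n_seq(1)) * 2 * math.e * r
    prod = 1.0
    for j in range(1, i + 1):
        prod *= 1 + 1 / 2 ** j
    return 2 * math.e * r / prod

for i in range(5):
    # With r = 1 the second column is r_i / r; it stays above 2 for all i.
    print(i, round(n_seq(i), 1), round(r_seq(i, 1.0), 4))
```

With $r = 1$, the first row shows $r_0$ close to 47, and the remaining rows decrease toward $2e/\prod_{j \ge 1}(1 + 1/2^j)$, which is larger than 2.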

We are now ready to present an algorithm satisfying the conditions of Lemma 2. 

Algorithm C: 

C.1. $i := 0$; $S_{-1} := \emptyset$;

C.2. While $r_i n_i < 8s/c$:

C.2.1. Construct $S_i$ by adding random elements from S to $S_{i-1}$ until $|S_i| = r_i n_i$.

C.2.2. $R_i := S_i \cap R$.

C.2.3. If $|R_i| \ge n_i$, then return $x \in S_i \cap R$.

C.2.4. $i := i + 1$;

C.3. Let $S_i$ be a random subset of S of size $8s/c$.

C.4. $R_i := S_i \cap R$.

C.5. If $|R_i| \ge 4s/(cr)$, then return $x \in S_i \cap R$.

C.6. Print "$|R|/|S| < 1/r$ with probability $> 1 - \exp(-s/(rc))$."
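
A compact Python rendering of Algorithm C may help in following the proof. It is a sketch under assumptions that go beyond the paper: S is an explicit sequence, `in_R` is a caller-supplied predicate, the nested samples $S_0 \subseteq S_1 \subseteq \ldots$ are realized as growing prefixes of one random permutation, sample sizes are rounded up and capped at $|S|$, and `None` stands for the statement printed in step C.6.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
A = 64 * math.log(16)   # a = 64 ln 16

def n_seq(i: int) -> float:
    return 1.0 if i == 0 else A * 4 ** i * (i + 3)

def r_seq(i: int, r: float) -> float:
    if i == 0:
        return math.log(2 * n_seq(1)) * 2 * math.e * r
    prod = 1.0
    for j in range(1, i + 1):
        prod *= 1 + 1 / 2 ** j
    return 2 * math.e * r / prod

def algorithm_c(S: Sequence[T], in_R: Callable[[T], bool], r: float, c: float,
                rng: random.Random) -> Optional[T]:
    s = len(S)
    budget = 8 * s / c
    order = list(S)
    rng.shuffle(order)            # prefixes of `order` play the role of S_0, S_1, ...
    hits: List[T] = []            # elements of R seen so far (R_i)
    seen = 0                      # current |S_i|
    i = 0                         # C.1
    while r_seq(i, r) * n_seq(i) < budget:                  # C.2
        size = min(s, math.ceil(r_seq(i, r) * n_seq(i)))    # C.2.1 (capped at |S|)
        while seen < size:
            if in_R(order[seen]):                            # C.2.2
                hits.append(order[seen])
            seen += 1
        if len(hits) >= n_seq(i):                            # C.2.3
            return hits[0]
        i += 1                                               # C.2.4
    final_size = min(s, math.ceil(budget))                   # C.3
    final_hits = [x for x in rng.sample(order, final_size) if in_R(x)]   # C.4
    if len(final_hits) >= 4 * s / (c * r):                   # C.5
        return final_hits[0]
    return None                                              # C.6

rng = random.Random(1)
S = list(range(100_000))
print(algorithm_c(S, lambda x: x % 50 == 0, 4.0, math.log(len(S)), rng))
```

The loop terminates because $r_i n_i$ grows without bound, so the condition in C.2 eventually fails and the final round C.3 to C.5 is reached.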

For $i > 0$, let $p_i$ be the probability that the algorithm returns an element
from R in round i. Here the round refers to the value of i in Step C.2.3 or C.5.

CLAIM 2A For all $i \ge 1$, the probability $p_i$ that the algorithm returns an
element from R in round i is at most $1/(n_i 2^i)$.

PROOF: Consider the following simplified algorithm D: Pick $r_i n_i$ random
elements from S. If at least $n_i$ elements belong to R, return one of them;
otherwise do not return any element.

We show below that the probability $p_D$ that algorithm D returns an element of
R is at most $1/(n_i 2^i)$. Now notice that the probability that algorithm C
returns an element in round i is at most $p_D$, since this event happens only if
$S_i$ contains at least $n_i$ elements from R and none of the previous rounds
returned an element of R. Thus, the claim follows.

To show the bound on $p_D$, note first that

$$1/(n_i 2^i) = 1/(a 4^i (i + 3) 2^i) = 1/(a 8^i (i + 3)) > 1/(178 \cdot 8^i (i + 3)).$$

We consider two cases:
We consider two cases: 

Case 1: $|R|/|S| \ge (1 + 1/2^{i+1})/((1 + 1/2^i) r_i)$: First consider the case
where $i = 1$. Then
$|R|/|S| \ge (1 + 1/2^2)/((1 + 1/2) r_1) = (1 + 1/4)/(2er) > 1/(2er)$.
Let b be $\ln(2n_1)$. We did not find any element from R in any of the $2ber$
samples in round 0, so

$$p_D \le (1 - |R|/|S|)^{2ber} < (1 - 1/(2er))^{2ber} < e^{-b} = 1/(2n_1).$$

Now suppose $i > 1$. Then
$|R|/|S| \ge (1 + 1/2^{i+1})/((1 + 1/2^i) r_i) = (1 + 1/2^{i+1})/r_{i-1}$.
In round $i - 1$ we did not return, so $|R_{i-1}|$ is less than $x = n_{i-1}$.
However, the expected value of $|R_{i-1}|$ is

$$\mu = r_{i-1} n_{i-1} |R|/|S| \ge n_{i-1}(1 + 1/2^{i+1}).$$

By Chernoff bound (2),

$$p_D \le \Pr(|R_{i-1}| < x) \le e^{-(\mu - x)^2/(2\mu)}.$$

For $\mu \ge n_{i-1}(1 + 1/2^{i+1})$, we get

$$\begin{aligned}
p_D &\le \exp\left(\frac{-(n_{i-1}/2^{i+1})^2}{2 n_{i-1}(1 + 1/2^{i+1})}\right) \\
&\le \exp(-n_{i-1}/2^{2i+4}) \\
&= 16^{-(i+2)} < 1/(178 \cdot 8^i (i + 3)) < 1/(n_i 2^i) \qquad (\forall i > 1).
\end{aligned}$$

Case 2: $|R|/|S| < (1 + 1/2^{i+1})/((1 + 1/2^i) r_i)$: First suppose that we are
returning in Step C.2.3. Then $|R_i|$ is at least $x = n_i$. However, the
expected value $\mu$ of $|R_i|$ is at most

$$r_i n_i (1 + 1/2^{i+1})/((1 + 1/2^i) r_i) = n_i(1 - 1/(2^{i+1} + 2)) \le n_i(1 - 1/2^{i+2}).$$

By Chernoff bound (1),

$$p_D \le \Pr(|R_i| \ge x) \le e^{-(x - \mu)^2/(3\mu)}.$$

For $\mu \le n_i(1 - 1/2^{i+2})$ we get

$$\begin{aligned}
p_D &\le \exp\left(\frac{-(n_i/2^{i+2})^2}{3 n_i(1 - 1/2^{i+2})}\right) \\
&\le \exp(-n_i/(3 \cdot 4^{i+2})) \\
&< 40^{-(i+3)} < 1/(178 \cdot 8^i (i + 3)) < 1/(n_i 2^i) \qquad (\forall i \ge 1).
\end{aligned}$$



Next suppose that we are returning in Step C.5. Then $|R_i|$ is at least
$x = 4s/(cr)$ and the expected value $\mu$ is at most
$(8s/c)(1 + 1/2^{i+1})/r_{i-1} = x(1 + 1/2^{i+1})\,2r/r_{i-1}$. Recall that the
$r_j$ were chosen so that

$$r_{i-1}/(2r) = e \Big/ \prod_{j=1}^{i-1}(1 + 1/2^j) > \prod_{j=i}^{\infty}(1 + 1/2^j).$$

Hence

$$\begin{aligned}
\mu &\le x(1 + 1/2^{i+1})\,2r/r_{i-1} \\
&< x(1 + 1/2^{i+1})/\bigl((1 + 1/2^i)(1 + 1/2^{i+1}) \cdots\bigr) \\
&< x(1 - 1/2^{i+1}).
\end{aligned}$$

Note that $x > n_{i-1} r_{i-1}/(2r) > n_{i-1}$ since $8s/c > r_{i-1} n_{i-1}$.
Thus, we get the desired bound on $p_D$:

$$\begin{aligned}
p_D &\le \exp\left(\frac{-(x/2^{i+1})^2}{3x(1 - 1/2^{i+1})}\right) \\
&\le \exp(-x/(3 \cdot 2^{2i+2})) \\
&< 40^{-(i+2)} < 1/(178 \cdot 8^i (i + 3)) < 1/(n_i 2^i) \qquad (\forall i \ge 1).
\end{aligned}$$

□ 
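
The numeric inequalities that close the cases of Claim 2A can be checked mechanically. The short sketch below does so for the first 40 rounds; the helper names `target` and `weaker` are hypothetical, and the check is an illustration rather than a substitute for the proof.

```python
import math

A = 64 * math.log(16)   # a = 64 ln 16

def target(i: int) -> float:
    """1/(n_i 2^i) = 1/(a 8^i (i + 3)), the bound claimed for p_i."""
    return 1.0 / (A * 8 ** i * (i + 3))

def weaker(i: int) -> float:
    """1/(178 * 8^i (i + 3)), the intermediate quantity used in the proof."""
    return 1.0 / (178 * 8 ** i * (i + 3))

for i in range(1, 41):
    assert weaker(i) < target(i)              # because a < 178
    assert 40.0 ** -(i + 3) < weaker(i)       # Case 2, Step C.2.3
    assert 40.0 ** -(i + 2) < weaker(i)       # Case 2, Step C.5
    if i > 1:
        assert 16.0 ** -(i + 2) < weaker(i)   # Case 1, only needed for i > 1
print("all inequalities hold for i = 1..40")
```

Note that the Case 1 inequality $16^{-(i+2)} < 1/(178 \cdot 8^i (i + 3))$ genuinely requires $i > 1$; for $i = 1$ the proof uses the separate bound $1/(2n_1)$, which equals $1/(n_1 2^1)$.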

We are now ready to show that Algorithm C satisfies the conditions of Lemma 2.

Case (i) First, we analyze the expected number of samples attributable to a type
(i) outcome. By Claim 2A, for $i > 0$, the probability $p_i$ that the algorithm
returns an element from R in round i is bounded by $1/(n_i 2^i)$. The expected
number of samples is thus bounded by $54r$:

$$\begin{aligned}
r_0 + \sum_{i=1}^{\infty} p_i r_i n_i &\le r_0 + \sum_{i=1}^{\infty} r_i n_i/(n_i 2^i) \\
&\le r_0 + \sum_{i=1}^{\infty} 2er/2^i \\
&= r_0 + 2er < 54r.
\end{aligned}$$
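
The constant 54 can be confirmed numerically. The sketch below evaluates $r_0 + \sum_{i \ge 1} r_i/2^i$ for $r = 1$, truncating the sum after a fixed number of terms (the truncation point and the function names are assumptions made for illustration; the omitted tail is negligible because the terms decay geometrically).

```python
import math

A = 64 * math.log(16)          # a = 64 ln 16
N1 = A * 4 * (1 + 3)           # n_1 = a * 4^1 * (1 + 3)

def r_seq(i: int, r: float) -> float:
    """r_0 = ln(2 n_1) * 2er and r_i = 2er / prod_{j=1..i} (1 + 1/2^j) for i > 0."""
    if i == 0:
        return math.log(2 * N1) * 2 * math.e * r
    prod = 1.0
    for j in range(1, i + 1):
        prod *= 1 + 1 / 2 ** j
    return 2 * math.e * r / prod

def expected_samples_bound(r: float, rounds: int = 60) -> float:
    """r_0 + sum_{i=1..rounds} r_i n_i / (n_i 2^i); the n_i cancel."""
    total = r_seq(0, r)
    for i in range(1, rounds + 1):
        total += r_seq(i, r) / 2 ** i
    return total

print(expected_samples_bound(1.0))   # multiply by r for general r
```

Since every $r_i$ with $i \ge 1$ is below $2er$, the computed value is below $r_0 + 2er$, which in turn is below $54r$.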

Case (ii) Second, we consider the case that the algorithm does not return an
element from R, in other words that the conditions in Steps C.2.3 and C.5 are
never satisfied. Then, the final sample has size $8s/c$, and the total sample
size is $O(s/c)$.

Suppose $|R|/|S| \ge 1/r$. We did not return an element from R in Step C.5,
so $X = |R_i|$ is less than $x = 4s/(cr)$. However, the expected value $\mu$ of
$|R_i|$ is at least $8s/(cr)$. The probability p is now calculated as in Case 1
of the proof of Claim 2A:

$$p \le e^{-(\mu - x)^2/(2\mu)} \le \exp\left(\frac{-(4s/(cr))^2}{2 \cdot 8s/(cr)}\right) \le \exp(-s/(cr)).$$